Improved Spectral-Norm Bounds for Clustering

Authors

  • Pranjal Awasthi
  • Or Sheffet
Abstract

Aiming to unify known results about clustering mixtures of distributions under separation conditions, Kumar and Kannan [KK10] introduced a deterministic condition for clustering datasets. They showed that this single deterministic condition encompasses many previously studied clustering assumptions. More specifically, their proximity condition requires that, in the target k-clustering, the projection of a point x onto the line joining its cluster center μ and some other center μ′ is a large additive factor closer to μ than to μ′. This additive factor can be roughly described as k times the spectral norm of the matrix representing the differences between the given (known) dataset and the means of the (unknown) target clustering. Clearly, the proximity condition implies center separation: the distance between any two centers must be at least as large as the above-mentioned bound. In this paper we improve upon the work of Kumar and Kannan [KK10] along several axes. First, we weaken the center separation bound by a factor of √k, and second, we weaken the proximity condition by a factor of k (in other words, the revised separation condition is independent of k). Using these weaker bounds we still achieve the same guarantees when all points satisfy the proximity condition. Under the same weaker bounds, we achieve even better guarantees when only a (1−ε)-fraction of the points satisfy the condition. Specifically, we correctly cluster all but a (ε + O(1/c))-fraction of the points, compared to the O(kε)-fraction of [KK10], which is meaningful even in the particular setting when ε is a constant and k = ω(1). Most importantly, we greatly simplify the analysis of Kumar and Kannan. In fact, in the bulk of our analysis we ignore the proximity condition and use only center separation, along with the simple triangle and Markov inequalities. Yet these basic tools suffice to produce a clustering which (i) is correct on all but a constant fraction of the points, (ii) has k-means cost comparable to the k-means cost of the target clustering, and (iii) has centers very close to the target centers. Our improved separation condition allows us to match the results of the Planted Partition Model of McSherry [McS01], improve upon the results of Ostrovsky et al. [ORSS06], and improve separation results for mixtures of Gaussians in a particular setting.

∗ An extended abstract of this work appears in APPROX-RANDOM 2012.

† This work was supported in part by the National Science Foundation under grants CCF-0830540, IIS-1065251, and CCF-1116892, as well as by CyLab at Carnegie Mellon under grants DAAD19-02-1-0389 and W911NF-09-1-0273 from the Army Research Office.
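The proximity condition described in the abstract can be illustrated with a small numerical sketch. The code below is not the authors' algorithm; it only checks, for a given labelled dataset, whether each point's projection onto the line between its own center and another center is closer to its own center by at least some margin. The function names and the `factor` parameter are hypothetical, and the margin is taken to be simply `factor` times the spectral norm of A − C. The constants and any cluster-size normalization used in the paper are not stated in the abstract and are not modelled here.

```python
import numpy as np


def spectral_norm_of_residual(A, labels, k):
    """Return (centers, ||A - C||_2), where row i of C is the mean of the
    cluster that row i of A belongs to. Assumes every cluster is non-empty."""
    centers = np.vstack([A[labels == r].mean(axis=0) for r in range(k)])
    C = centers[labels]
    # ord=2 on a 2-D array gives the largest singular value (spectral norm).
    return centers, np.linalg.norm(A - C, ord=2)


def proximity_violations(A, labels, k, factor=1.0):
    """Boolean mask of points that fail the illustrative proximity check.

    ASSUMPTION: margin = factor * ||A - C||_2. This is a stand-in for the
    paper's additive factor, whose exact form is not given in the abstract.
    Assumes the k cluster centers are pairwise distinct.
    """
    centers, sigma = spectral_norm_of_residual(A, labels, k)
    margin = factor * sigma
    violations = np.zeros(len(A), dtype=bool)
    for r in range(k):
        idx = np.flatnonzero(labels == r)
        pts = A[idx]
        for s in range(k):
            if s == r:
                continue
            direction = centers[s] - centers[r]
            dist = np.linalg.norm(direction)
            u = direction / dist
            # Coordinate of each projected point on the line through the two
            # centers, with the own center at 0 and the other center at dist.
            t = (pts - centers[r]) @ u
            # Distance to the other center minus distance to the own center.
            gap = np.abs(t - dist) - np.abs(t)
            violations[idx[gap < margin]] = True
    return violations


# Toy data: two tight clusters whose centers are 100 apart.
offsets = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
A = np.vstack([offsets, offsets + np.array([100.0, 0.0])])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
centers, sigma = spectral_norm_of_residual(A, labels, k=2)
mask_small = proximity_violations(A, labels, k=2, factor=1.0)
mask_large = proximity_violations(A, labels, k=2, factor=60.0)
```

In this toy dataset the residual matrix has spectral norm 2, so with `factor=1.0` every point passes the check, while `factor=60.0` demands a margin larger than the distance between the centers and every point fails. This mirrors the observation in the abstract that the proximity condition implies center separation.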

Related resources

Some new perturbation bounds of generalized polar decomposition

Some new perturbation bounds of the positive (semi) definite polar factor and the (sub) unitary polar factor for the (generalized) polar decomposition under the general unitarily invariant norm and the spectral norm are presented. By applying our new bounds to the weighted cases, the known perturbation bounds for the weighted polar decomposition are improved.

Lecture 4 : Concentration and Matrix Multiplication

Today, we will continue our discussion of scalar and matrix concentration with the matrix analogues of Markov’s, Chebyshev’s, and Chernoff’s inequalities. Then, we will return to bounding the error for our approximating matrix multiplication algorithm. We will start with using Hoeffding-Azuma bounds from last class to get improved Frobenius norm bounds, and then (next time...

Perturbation bounds for $g$-inverses with respect to the unitarily invariant norm

Let complex matrices $A$ and $B$ have the same sizes. Using the singular value decomposition, we characterize the $g$-inverse $B^{(1)}$ of $B$ such that the distance between a given $g$-inverse of $A$ and the set of all $g$-inverses of the matrix $B$ reaches minimum under the unitarily invariant norm. With this result, we derive additive and multiplicative perturbation bounds of the nearest per...

Tensor sparsification via a bound on the spectral norm of random tensors

Given an order-d tensor A ∈ ℝ^{n×n×...×n}, we present a simple, element-wise sparsification algorithm that zeroes out all sufficiently small elements of A, keeps all sufficiently large elements of A, and retains some of the remaining elements with probabilities proportional to the square of their magnitudes. We analyze the approximation accuracy of the proposed algorithm using a powerful inequalit...

Sharp Bounds on the PI Spectral Radius

In this paper some upper and lower bounds for the greatest eigenvalues of the PI and vertex PI matrices of a graph G are obtained. Those graphs for which these bounds are best possible are characterized.

Lower Bounds for the Spectral Norm

Let A be a complex m × n matrix. We find simple and good lower bounds for its spectral norm ‖A‖ = max{ ‖Ax‖ | x ∈ ℂ^n, ‖x‖ = 1 } by choosing x smartly. Here ‖ · ‖ applied to a vector denotes the Euclidean norm.

Publication date: 2012